
Spider
A spider is a program that travels the World Wide Web and compiles the information it
uncovers into a database. Spiders "crawl" across the Web by following the links in one HTML
document to reach another. When asked a question, a spider searches its database
rather than querying the Web itself. Not all spiders are created equal: it is
worth noting how large a spider's database is, how frequently the spider conducts its
searches, and whether it is restricted to searching a fixed set of Web servers.
Try a variety of spiders to discover which one returns the most useful results.
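The crawl-then-search cycle described above can be sketched as follows. This is a minimal illustration in Python against a made-up in-memory "web" (the page names and contents are invented for the example), not a production crawler:

```python
from collections import deque
from html.parser import HTMLParser

# A tiny made-up in-memory "web": page name -> HTML source.
# (A real spider would fetch these documents over HTTP.)
PAGES = {
    "a.html": '<html><body>spider homepage <a href="b.html">B</a> <a href="c.html">C</a></body></html>',
    "b.html": '<html><body>ground spider <a href="c.html">C</a></body></html>',
    "c.html": '<html><body>web crawler</body></html>',
}

class PageParser(HTMLParser):
    """Pull out both the links (to crawl next) and the words (to index)."""
    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.words.extend(data.lower().split())

def crawl_and_index(start):
    """Breadth-first crawl from `start`, compiling a word -> pages database."""
    index = {}
    seen = set()
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if page in seen or page not in PAGES:
            continue
        seen.add(page)
        parser = PageParser()
        parser.feed(PAGES[page])
        for word in parser.words:
            index.setdefault(word, set()).add(page)
        queue.extend(parser.links)
    return index

# A query searches the compiled database, not the live Web:
index = crawl_and_index("a.html")
print(sorted(index["spider"]))  # -> ['a.html', 'b.html']
```

Because the database is built once during the crawl, answering a query is a simple lookup, which is why a spider can respond without touching the Web at query time.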
URLs:
- WWW Tools and Guides (Search Forms, Search Engines, Web Page Lists,
  Posting Services, HTML Guides)
- "A quick stop for links to search engines"
- Lycos
- Lycos is the deservedly popular Web search engine located at
Carnegie Mellon. The word "lycos" is taken from Lycosidae, the name for
a family of large ground spiders that capture their prey rather than
trapping it in a web. It is a favorite search engine of many because it
includes text from each hit, which allows the user to preview the
usefulness of the results. (Many search engines link directly to a URL,
so each result must be visited in order to see what is there.)
- Open Text Index
- "Over 15 million links. All instantly searchable. All constantly
updated." Open Text is the search software that Yahoo uses.
W3E References:
- search engines
- WebCrawler
- worm
Print References:
- "Finding Needles in a Haystack--How to use search engines to
dredge up only the things you need to see" by Clay Shirky.
NetGuide, issue 210, p. 87, October 1, 1995.
- "Protocol Gives Sites Way to Keep out the 'Bots'" by Jeremy Carl.
Web Week, vol. 1, issue 7, November 1995.
- The World Wide Web Unleashed by John December and Neil
Randall. Sams Publishing, Indianapolis, IN, 1995.
ISBN: 0-672-30737-5.
Detail:
Some sites use the robot exclusion protocol to keep "robots" out.
Developed in 1994, the protocol takes the form of a file on the server
(conventionally /robots.txt) listing user agent names and the paths
those agents may not visit. The file can exclude certain types of
robots as well as exclude robots from particular parts of a site.
Sites that change frequently, such as newspaper sites, or sites that
cannot handle the traffic that robots bring, have valid reasons for
wishing to exclude these search engines. Major search sites, such as
Open Text and Lycos, also have an interest in following the robot
exclusion protocol, as some of them use it themselves to protect
portions of their own sites from unwanted searches.
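The exclusion file is plain text: each record names a user agent and the path prefixes it may not request. The sketch below uses Python's standard urllib.robotparser against a hypothetical exclusion file (the user agent names and paths are invented for illustration):

```python
from urllib.robotparser import RobotFileParser

# A hypothetical exclusion file: records are separated by blank lines.
# "*" applies to all robots; "Disallow: /" shuts a robot out entirely.
ROBOTS_TXT = """\
User-agent: *
Disallow: /breaking-news/

User-agent: BadBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved robot checks before fetching each path:
print(parser.can_fetch("Lycos", "/archive/page.html"))         # -> True
print(parser.can_fetch("Lycos", "/breaking-news/today.html"))  # -> False
print(parser.can_fetch("BadBot", "/archive/page.html"))        # -> False
```

Note that the protocol is purely advisory: nothing stops a rude robot from ignoring the file, which is why it works only because the major spiders choose to honor it.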

E-Mail:
The World Wide Web Encyclopedia at wwwe@tab.com
E-Mail: Charles River Media at chrivmedia@aol.com
Copyright 1996 Charles River Media. All rights reserved.
Text - Copyright © 1995, 1996 - James Michael Stewart & Ed Tittel.
Web Layout - Copyright © 1995, 1996 - LANWrights &
IMPACT Online.
Revised -- February 20th, 1996